A Method of Automated Nonparametric Content Analysis for Social Science

نویسندگان

  • Daniel J. Hopkins
  • Gary King
چکیده

The increasing availability of digitized text presents enormous opportunities for social scientists. Yet hand coding many blogs, speeches, government records, newspapers, or other sources of unstructured text is infeasible. Although computer scientists have methods for automated content analysis, most are optimized to classify individual documents, whereas social scientists instead want generalizations about the population of documents, such as the proportion in a given category. Unfortunately, even a method with a high percent of individual documents correctly classified can be hugely biased when estimating category proportions. By directly optimizing for this social science goal, we develop a method that gives approximately unbiased estimates of category proportions even when the optimal classifier performs poorly. We illustrate with diverse data sets, including the daily expressed opinions of thousands of people about the U.S. presidency. We also make available software that implements our methods and large corpora of text for further analysis.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Novel Method for Automated Estimation of Effective Parameters of Complex Auditory Brainstem Response: Adaptive Processing based on Correntropy Concept

Objectives: Automated Auditory Brainstem Responses (ABR) peak detection is a novel technique to facilitate the measurement of neural synchrony along the auditory pathway through the brainstem. Analyzing the location of the peaks in these signals and the time interval between them may be utilized either for analyzing the hearing process or detecting peripheral and central lesions in the human he...

متن کامل

Novel Automated Method for Minirhizotron Image Analysis: Root Detection using Curvelet Transform

In this article a new method is introduced for distinguishing roots and background based on their digital curvelet transform in minirhizotron images. In the proposed method, the nonlinear mapping is applied on sub-band curvelet components followed by boundary detection using energy optimization concept. The curvelet transform has the excellent capability in detecting roots with different orient...

متن کامل

Analysis of Users’ Opinions about Reasons for Divorce

One of the most important issues related to knowledge discovery is the field of comment mining. Opinion mining is a tool through which the opinions of people who comment about a specific issue can be evaluated in order to achieve some interesting results. This is a subset of data mining. Opinion mining can be improved using the data mining algorithms. One of the important parts of opinion minin...

متن کامل

Gender-based Differences in Associations between Attitude and Self-esteem with Smoking Behavior among Adolescents: A Secondary Analysis Applying Bayesian Nonparametric Functional Latent Variable Model

Background: Different patterns of gender-based relationships between attitude toward smoking and self-esteem with smoking behavior have reported. However, such associations may be much more complex than a simply supposed linear relationship. We aimed to propose a method of providing hand details on the total and gender-based scenarios of the relationships between attitude toward smoking and sel...

متن کامل

Incorporating nonparametric statistics into Delphi studies in library and information science

Introduction. The Delphi technique is widely used in library and information science research. However, many researchers in the field fail to employ standard statistical tests when using this technique. This makes the technique vulnerable to criticisms of its reliability and validity. The general goal of this article is to explore how nonparametric statistical techniques could mitigate this dra...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009